Abstract

Despite recent improvements in molecular techniques, biological knowledge remains incomplete.
Reasoning on living systems hence requires integrating heterogeneous and partial information.
Although current investigations successfully focus on the qualitative behaviors of macromolecular
networks, other approaches provide partial quantitative information, such as variations of protein
concentrations over time. We consider that both kinds of information, qualitative and quantitative,
have to be combined into a modeling method to provide a better understanding of the biological
system. We propose here such a method using a probabilistic-like approach. After describing it in
detail, we illustrate its advantages by modeling the carbon starvation response in Escherichia coli.
For this purpose, we build an original qualitative model based on available observations. After
the formal verification of its qualitative properties, the probabilistic model shows quantitative
results that correspond to biological expectations, which confirms the interest of our probabilistic
approach.

1 Introduction

The last decade has seen great successes in macromolecular network modeling. In particular,
qualitative methods appear today to be well adapted for reasoning on biological systems, despite the
current lack of quantitative information (de Jong, 2002). Thus most of the interesting and
investigated knowledge concerns local information such as gene-gene or gene-protein interactions.
It allows us to build networks like the one in Figure 1 (A), which model the global qualitative
behavior of a biological system. However, other experiments, illustrated in Figure 1 (B), give
insights into various kinds of partial quantitative knowledge. They emphasize both molecular
concentration variations and time series. These two related kinds of partial quantitative
information, i.e., time and concentration, are well studied by other experiments (Wolfe, 2005) and
also reflect the overall system behavior. Both kinds of information, qualitative and quantitative,
hence have to be combined into a modeling method to give a better understanding of the biological
system. Because quantitative information is scarce, we propose a modeling approach that (i) spreads
partial local information through the qualitative network and (ii) gives insights into global
behaviors. Probabilistic approaches are well adapted for bringing complementary quantitative or
semi-quantitative knowledge into a qualitative model. Among them, we suggest an original tool-based
approach that predicts various molecular productions by combining qualitative and partial
quantitative knowledge. After an overview of our probabilistic approach (Sec. 2), we apply it to a
gene regulatory model of the carbon starvation response in Escherichia coli. For this purpose, we
build a model based on a novel qualitative abstraction and validate its behavior using a formal
verification approach (Sec. 3.1), which allows us to apply our probabilistic method accurately
(Sec. 3.2). Such a protocol emphasizes several biological insights of interest.

Figure 1: Biological information concerning the Escherichia coli carbon starvation system. (A)
represents interactions between genes involved in the regulatory network (adapted from
(Ropers et al., 2006)). (B) shows quantitative variations of macromolecules of interest (based on
(Ball et al., 1992)). Note the linear relationship between fis RNA and Fis protein productions.



2 Method 

2.1 Biological system formalization 

We consider biological networks as graphs that show transitions between various components of 
the system. Each transition is related to variations of characteristic quantities of the system and 
produces its own impact on the whole system behavior. In a gene regulatory network, a qualitative 
graph arrow is associated with a production or consumption of the corresponding protein. 

In order to abstract qualitative biological behaviors, we represent a gene regulatory network by
a qualitative graph where each state stands for a qualitative variation of a gene activity. We
focus on the derivative of the macromolecular transformation, which makes detailed macromolecular
concentration variations more tractable to model. As an illustration, the following interactions
describe the fact that (i) gene x activates gene y and (ii) x represses y:

(i) $x \rightarrow y^{+}$     (ii) $x \rightarrow y^{-}$

Such a representation implies that gene x produces protein X, which acts on gene y. Thus (i) and
(ii) represent respectively an overall increase and an overall decrease of Y protein production.
Note that such an abstraction neglects post-transcriptional regulations, which makes it particularly
inappropriate for modeling eukaryotic gene regulatory networks.

This biological abstraction allows us to model various qualitative interactions. Considering that
the activity of a gene x is summarized by two qualitative states $x^{+}$ and $x^{-}$, the activation
of y by x might be described by the following set of rules and its corresponding transitions:

$\{x^{+} \Rightarrow y^{+}\} \wedge \{x^{-} \Rightarrow y^{+}\}$

A peak of gene x activity that activates y is represented by:

$\{x^{+} \Rightarrow y^{+}\} \wedge \{x^{-} \Rightarrow \emptyset\}$

A minimal activity of gene x that activates y is symbolized by:

$\{x^{+} \Rightarrow \emptyset\} \wedge \{x^{-} \Rightarrow y^{-}\}$

Repressions of gene y by a gene x activity are modeled using similar rules that imply transitions
toward $y^{-}$. Such an abstraction gives the opportunity to focus on qualitative behaviors.
Reasoning on the quantities associated with qualitative rules allows us to emphasize quantitative
states of the system despite concurrent qualitative rules.

2.2 Graph model and quantities 

We assume that the biological system is associated with several quantities $q_1, \ldots, q_k$
that represent the current state of the system. For instance, these quantities may represent protein
concentrations, or less obvious quantities such as the number of times a particular pathway is
taken by the living system. Studying the behavior of a biological system hence consists in
understanding the evolution of these quantities. Note that such quantities may not be experimentally
measurable. Over the last decade, biological behaviors have often been described by qualitative
graphs that abstract the variations of different components within the system. In our model, we
consider that each transition of this qualitative graph implies a potential variation of each
quantity. Here we propose a method that focuses on these quantities.

We consider two types of quantities: some quantity variations are additive whereas others are
multiplicative. (i) Each transition from i to j is associated with a real number $c_{ij}$; the
quantity q is additive if the quantity $q = x$ before the transition becomes $x + c_{ij}$ after the
transition. (ii) Each transition from i to j is associated with a strictly positive real number
$c_{ij}$; the quantity q is multiplicative if the quantity $q = x$ before the transition becomes
$x \, c_{ij}$ after the transition. Each quantity q is thus associated with a matrix $C_q$ in which
the element at position $(i, j)$ is the contribution $c_{ij}$ of the transition from i to j. We want
to understand the typical behavior of given additive or multiplicative quantities after a given
time. These behaviors are controlled by an accumulation of small contributions. As an illustration,
we consider the following graph.




We consider two distinct quantities $q_1$ and $q_2$. Their associated cost matrices are respectively
$C_1$ and $C_2$. $q_1$ counts the number of times that transition $c \rightarrow b$ is taken. $q_2$
models the concentration of a product. It increases by 20% for every transition pointing to b and
decreases by 10% for all other transitions (i.e., an abstraction of the natural degradation of the
product). Note that, by convention, a cost of 0 (or respectively 1 for multiplicative quantities)
has been assigned to the non-existing transitions $b \rightarrow c$ and $c \rightarrow c$. Thus, as
an illustration, given initial quantities $q_1 = 0$, $q_2 = 1$ and the trajectory abacbacacba, their
values become $q_1 = 2$ and $q_2 \approx 0.826$.
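
The bookkeeping of this example can be checked mechanically. The following Python sketch replays the
trajectory abacbacacba with the costs stated in the text; the function names are illustrative
choices and not part of the original method.

```python
def additive_cost(src, dst):
    """Cost for q1: counts how often the transition c -> b is taken."""
    return 1 if (src, dst) == ("c", "b") else 0


def multiplicative_cost(src, dst):
    """Cost for q2: +20% on transitions pointing to b, -10% on all others."""
    return 1.2 if dst == "b" else 0.9


def evaluate(trajectory, q1=0, q2=1.0):
    """Accumulate both quantities along a trajectory given as a string of states."""
    for src, dst in zip(trajectory, trajectory[1:]):
        q1 += additive_cost(src, dst)   # additive update: x -> x + c
        q2 *= multiplicative_cost(src, dst)  # multiplicative update: x -> x * c
    return q1, q2


q1, q2 = evaluate("abacbacacba")
print(q1, round(q2, 3))  # 2 0.826
```

The trajectory contains three transitions pointing to b and seven other transitions, so
$q_2 = 1.2^3 \times 0.9^7 \approx 0.826$.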
2.3 Probabilistic model



For practical purposes, we are interested in the values of all quantities along a random trajectory.
Transitions contribute differently to the global system behavior. We assume here that each
transition has its own probability. Thus, at each step, one transition is chosen at random among all
the transitions that leave the current state. The sum of the probabilities associated with all edges
that leave a given state is 1.
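
A minimal Python sketch of this random choice, for the case of fixed probabilities, is given below.
The probability values and the presence of the self-loops a -> a and b -> b are assumptions made for
illustration only; the example of Sec. 2.2 only states that b -> c and c -> c do not exist.

```python
import random

# Hypothetical transition probabilities for a three-state graph.
# Each row sums to 1; the transitions b -> c and c -> c are absent.
PROBS = {
    "a": {"a": 0.2, "b": 0.4, "c": 0.4},
    "b": {"a": 0.7, "b": 0.3},
    "c": {"a": 0.5, "b": 0.5},
}


def sample_trajectory(start, steps, rng):
    """Random walk: at each step, pick one transition leaving the current state."""
    state, path = start, [start]
    for _ in range(steps):
        targets = list(PROBS[state])
        weights = [PROBS[state][t] for t in targets]
        state = rng.choices(targets, weights=weights, k=1)[0]
        path.append(state)
    return "".join(path)


rng = random.Random(0)  # fixed seed so that the run is reproducible
print(sample_trajectory("a", 10, rng))
```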

For fixed probabilities at all steps, this model is a weighted Markov chain. Nevertheless,
probabilities may vary, showing a behavior controlled by a dynamical system (see (Vallee, 2001) for
further details about dynamical sources). This model is hence quite general and particularly
suitable for theoretical studies, since it includes simple probabilistic models such as Markov
chains and hidden Markov chains, as well as trickier models that handle unbounded correlations
(i.e., the choice made at one step influences all subsequent choices). In this last case, generating
operators play the role of transition probabilities. For this reason, we refer to our model as a
graph with dynamical sources (or GDS model). The GDS is called nice if it satisfies some classical
conditions of the theory of Markov chains and dynamical sources (namely, the graph is strongly
connected and aperiodic, and all the dynamical systems are topologically mixing and possess
expansive branches). We consider the transition matrix $T = (t_{ij})$ of the qualitative graph, in
which the element $(i, j)$ is the generating operator relative to the transition from i to j.
Reasoning on system properties requires focusing on the asymptotic properties of the quantities.
These mathematical properties are well studied in the theories of both Markov chains and dynamical
sources (Bourdon and Vallee, 2006).

2.4 Typical behaviors 

The previous theoretical assumptions allow us to emphasize typical characteristics of quantities.
More precisely, for a given GDS model, we provide results for the mean, the variance and the limit
distribution. The following theorem summarizes our results.

Theorem 1 Let $\mathcal{M}$ be a nice GDS model with transition matrix $T = (t_{ij})$ and q a
quantity with cost matrix $C = (c_{ij})$. Let $Q_n$ be the random variable equal to the quantity q
after n steps of the GDS model $\mathcal{M}$.

(i) If q is an additive quantity, $Q_n$ asymptotically follows (when n tends to $\infty$) a Normal
law with mean and variance

$\mathbb{E}[Q_n] = \alpha_1 n + O(1)$,

$\mathrm{Var}[Q_n] = \alpha_2 n + O(1)$,

where $\alpha_1 = \lambda'(1)$ and $\alpha_2 = \lambda''(1) + \lambda'(1) - \lambda'(1)^2$ are
expressed by means of the derivatives of the dominant eigenvalue $\lambda(u)$ of the matrix $A(u)$
defined by $A_{ij}(u) = t_{ij} \, u^{c_{ij}}$.

(ii) If q is a multiplicative quantity, $Q_n$ asymptotically follows (when n tends to $\infty$) a
log-Normal law with mean and variance

$\mathbb{E}[Q_n] = \beta_1 \gamma_1^n + o(\Lambda_1^n)$,

$\mathrm{Var}[Q_n] = \beta_2 \gamma_2^n + o(\Lambda_2^n)$,

where $\gamma_1 = \lambda(e)$ and $\gamma_2 = \max(\lambda(e^2), \lambda(e)^2)$ are expressed by
means of the dominant eigenvalue $\lambda(u)$ of the matrix $A(u)$ defined by
$A_{ij}(u) = t_{ij} \, u^{\log c_{ij}}$. $\beta_1$ and $\beta_2$ are constants corresponding to the
dominant eigenvectors of $A(e)$ and $A(e^2)$. The error terms $\Lambda_1$ and $\Lambda_2$ verify
$\Lambda_1 < \gamma_1$ and $\Lambda_2 < \gamma_2$.
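
For a GDS model with fixed probabilities (a plain Markov chain), the constants $\alpha_1$ and
$\gamma_1$ of Theorem 1 can be approximated numerically. The Python sketch below is one possible way
to do so, under two assumptions that are not taken from the paper: the transition probabilities are
hypothetical values for the three-state example of Sec. 2.2, and $\lambda'(1)$ is evaluated through
the standard identity $\lambda'(1) = \sum_i \pi_i \sum_j t_{ij} c_{ij}$, where $\pi$ is the
stationary distribution of T. Since $u^{\log c_{ij}}$ equals $c_{ij}$ at $u = e$, $\gamma_1$ is the
dominant eigenvalue of the entrywise product of T with the multiplicative costs.

```python
# Hypothetical fixed transition probabilities over states (a, b, c); rows sum to 1.
T = [[0.2, 0.4, 0.4],
     [0.7, 0.3, 0.0],
     [0.5, 0.5, 0.0]]
# Additive costs of q1: only the transition c -> b contributes 1.
C1 = [[0, 0, 0],
      [0, 0, 0],
      [0, 1, 0]]
# Multiplicative costs of q2: x1.2 on transitions pointing to b, x0.9 elsewhere.
C2 = [[0.9, 1.2, 0.9] for _ in range(3)]


def power_iteration(matrix, iters=2000):
    """Dominant eigenvalue and left eigenvector (normalised to sum 1)."""
    n = len(matrix)
    vec = [1.0 / n] * n
    value = 0.0
    for _ in range(iters):
        nxt = [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        value = sum(nxt)  # vec sums to 1, so this converges to the eigenvalue
        vec = [x / value for x in nxt]
    return value, vec


# alpha_1 = lambda'(1): mean additive cost per step under the stationary law.
_, pi = power_iteration(T)
alpha1 = sum(pi[i] * T[i][j] * C1[i][j] for i in range(3) for j in range(3))

# gamma_1 = lambda(e): dominant eigenvalue of the matrix (t_ij * c_ij).
weighted = [[T[i][j] * C2[i][j] for j in range(3)] for i in range(3)]
gamma1, _ = power_iteration(weighted)

print(alpha1, gamma1)
```

With these assumed probabilities, a value of $\gamma_1$ above 1 means that the expected
concentration grows geometrically, whereas a value below 1 means that degradation dominates.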



Sketch of proof. See (Bourdon and Eveillard, 2007) for a complete proof of this theorem. Considering
the additive case is sufficient: if q is a multiplicative quantity, then log q is an additive one
and it is easy to obtain the results of (ii). The study involves several classical elements of
average-case analysis, such as generating functions. Let m be the number of states of the
GDS model and $P_0$ be a probability vector whose element i is the probability that the initial
state is state i. Since we consider asymptotic cases, this initial vector does not have any
influence on the result. The generating function $Q(z, u)$ defined as

$Q(z, u) = \sum_{n \geq 0} \mathbb{E}[u^{Q_n}] \, z^n$

allows us to study the quantities of interest. Indeed,

$\mathbb{E}[Q_n] = [z^n] \, \frac{\partial}{\partial u} Q(z, u) \Big|_{u=1}$.

For a nice GDS model, the matrix $A(u)$ admits a dominant eigenvalue in a neighbourhood of
$u = 1$ and decomposes as $A(u) = \lambda(u) P(u) + N(u)$, where $\lambda(u)$ is the dominant
eigenvalue, $P(u)$ is the projection on the dominant eigenvector and $N(u)$ is associated with the
remainder of the spectrum (and is thus orthogonal to $P(u)$). Consequently, for large n, one has

$A(u)^n \approx \lambda(u)^n P(u)$.

It is then easy to obtain a formula for the mean. The study of the variance follows similar
arguments and involves the second derivative of $Q(z, u)$. Finally, the limit law is obtained by
applying Hwang's general result on bivariate generating functions (Hwang, 1996). □



Supplementary results have been obtained but they are not detailed here. Among others, we
calculate the probability for a quantity to reach a given threshold before a given time t (this
generalizes the hitting probability, common in Markov chain theory) and the joint law of several
quantities. Most of our computations extend to some cases where the graph is not strongly connected
or aperiodic.



2.5 A typical biological study 

The previous theory allows us to reason on quantitative properties of the system, and also provides
the core of a dedicated software^. This software works on GDS models with fixed probabilities and
represents an accurate tool for simulating macromolecular networks. As inputs, it needs a graph (or,
better, a qualitative graph) and the cost matrices of the quantities of interest. GDS probabilities
are unknown or partially unknown, which makes it almost impossible to predict the quantitative
impact of an interaction on the system behavior. Nevertheless, experimental results on the behavior
of some quantities, like protein concentrations, are known. POGG uses such information and adopts a
reverse-engineering point of view. The previous theoretical results give (in)equalities that relate
the unknown probabilities to experimental measures (of part or all of the quantities). POGG uses
general local search techniques, such as Tabu search, to estimate the impact of a local interaction
on the whole biological system. The resulting model gives us the opportunity to predict the behavior
of other quantities. Note that the software also provides supplementary information, such as an
approximation of the hyper-volume of the models that are consistent with the measures. This
information helps to decide whether a new measure is informative or not, using a simple comparison
between different volumes.

^POGG: Probabilities On Genetic Graphs is available at http://www.sciences.univ-nantes.fr/lina/bioserv/POGG/
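
The reverse-engineering idea can be illustrated on a toy case. The Python sketch below is not POGG
and does not use Tabu search: it is a plain hill-climbing search, written under the assumption of a
three-state graph with hypothetical probabilities, in which a single unknown probability p (for the
transition c -> b) is tuned so that the growth rate $\gamma_1$ of a multiplicative quantity matches
a hypothetical measured value of 1.01.

```python
def dominant_eigenvalue(matrix, iters=500):
    """Dominant eigenvalue of a nonnegative primitive matrix by power iteration."""
    n = len(matrix)
    vec = [1.0 / n] * n
    value = 0.0
    for _ in range(iters):
        nxt = [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        value = sum(nxt)
        vec = [x / value for x in nxt]
    return value


def growth_rate(p):
    """gamma_1 of the toy graph when P(c -> b) = p and P(c -> a) = 1 - p."""
    probs = [[0.2, 0.4, 0.4], [0.7, 0.3, 0.0], [1.0 - p, p, 0.0]]
    factor = [0.9, 1.2, 0.9]  # multiplicative cost, indexed by target state (a, b, c)
    weighted = [[probs[i][j] * factor[j] for j in range(3)] for i in range(3)]
    return dominant_eigenvalue(weighted)


def local_search(target, p=0.5, step=0.25, tol=1e-9):
    """Hill climbing on p: move to the better neighbour, otherwise halve the step."""
    error = abs(growth_rate(p) - target)
    while step > tol:
        candidates = [q for q in (p - step, p + step) if 0.0 <= q <= 1.0]
        best = min(candidates, key=lambda q: abs(growth_rate(q) - target))
        best_error = abs(growth_rate(best) - target)
        if best_error < error:
            p, error = best, best_error
        else:
            step /= 2.0
    return p, error


p_hat, err = local_search(target=1.01)  # 1.01 is a hypothetical measured rate
print(p_hat, err)
```

With several unknown probabilities the same constraint no longer pins down a single value, which is
where the volume of consistent models mentioned above becomes informative.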



3 Results 



(Ropers et al., 2006) model the growth phase transition of a bacterium after a nutritional stress.
In particular, the model shows the transition from the exponential growth state to a more stationary
growth during a carbon starvation stage. Their qualitative results are consistent with experimental
knowledge, which allows us to consider the model as an appropriate benchmark for our modeling
approach. Furthermore, the macromolecules that interact within the model are well studied. This
gives us various pieces of partial quantitative information that have to be introduced into the
qualitative model.



3.1 Carbon starvation response in Escherichia coli: gene regulatory network 
and qualitative rules validation 

We consider hypotheses similar to those presented in (Ropers et al., 2006) and propose a new graph
that represents identical qualitative behaviors of bacterial responses after a nutritional stress.
As an illustration, and using the abstractions described in Sec. 2.1, we detail in Figure 2 one
particular biological component: the crp gene. The gene crp is controlled by two promoters that are
both repressed by the Fis protein (Gonzalez-Gil et al., 1998). Following assumptions from
(Ropers et al., 2006), we omit the negative control of crp and summarize the impact of the cAMP
metabolite using rules that involve the Cya and Crp proteins as well as the carbon starvation signal
(Harman, 2001).



Figure 2: Qualitative representation of the interactions of crp with other genes and with the carbon
starvation signal. Label 1 represents the repression of crp by Fis protein production. Labels 2 and
5 are transitions for the basal synthesis rate, which plays an important role during the exponential
growth phase. The combination of concurrent rules 3 and 4 summarizes the crp activation via the cAMP
metabolite.

Figure 3: Qualitative graph representing the gene regulatory network of the carbon starvation
response in Escherichia coli. Signal represents the input module that indicates the carbon
starvation condition.



We use a similar approach for describing each biological component of the gene regulatory network.
Figure 3 represents the corresponding qualitative graph. Our aim is to demonstrate the advantages
of our probabilistic approach. Therefore we do not detail here the biological assumptions that have
been used for building the model; see (Ropers et al., 2006) for exhaustive hypotheses. Before
further in silico investigations, the model has to be validated. This is the sine qua non condition
for applying a probabilistic approach. Although probabilities can be estimated using an appropriate
optimization, our confidence in such parameters depends on the ability of the model to
reproduce appropriate qualitative behaviors of the biological system (i.e., various kinds of
qualitative models can produce similar quantitative results). Using the symbolic model checker of
BIOCHAM (Calzone et al., 2006, Fages et al., 2004), we thus check the qualitative rules in order to
verify their consistency with experimental understanding. In particular, (Browning et al., 2004)
show an antagonistic relationship between fis and crp activities. To validate the model,
we can ask positive queries (i.e., queries where the expected answer is true) such as
$(fis^{+} \wedge \neg fis^{-}) \Rightarrow (crp^{-} \wedge \neg crp^{+})$. We can also ask negative
queries (i.e., queries where the expected answer is false), such as
$(fis^{-} \wedge \neg fis^{+}) \Rightarrow (crp^{-} \wedge \neg crp^{+})$. Using this formal
verification on the qualitative model, we successfully check other biological properties, like the
relationship between the carbon starvation signal and crp expression (Ishizuka et al., 1994) as well
as cya activities (Ball et al., 1992).

3.2 Probabilistic results 

We therefore have at our disposal an accurate qualitative graph (Figure 3) and quantitative
information (Figure 4 (A)) that belong to the same bacterial system. Our modeling approach
exploits such information and predicts probabilities on graph transitions using a local search
algorithm. In practice, we take into account the fact that the Fis concentration is multiplied by 10
in 80 minutes during the stationary growth phase. We treat the Fis concentration $q_{Fis}$ as a
multiplicative quantity (see Sec. 2.2). It therefore increases by 20% for each transition pointing
to $fis^{+}$, decreases by 20% for all transitions pointing to $fis^{-}$ and decreases by 10%, for
natural degradation, on all other transitions. We estimate the Fis quantity at time 80 as
$q_{Fis} \approx 10 \approx \gamma^{1600}$ with $\gamma \approx 1.001$ (1600 corresponds to the
number of steps performed by the model during 80 minutes; this number is established by considering
Cya natural degradation during the first 2 minutes). Comparing this numerical value with the
constants from Theorem 1, we get a constraint that relates probabilities to a measure on the system.
Local search methods then allow us to find a suitable probability matrix, which is used for the
simulations.

Figure 4: Summary of the information used and produced by the probabilistic approach. (A) shows
variations of Fis and Cya protein concentrations as a function of growth phases (Ball et al., 1992,
Notley-McRobb et al., 1997). Two Fis variations during stationary growth have been used for
estimating the probabilities associated with the qualitative transitions from Figure 3. This allows
us to reproduce the quantitative behaviors of Cya and Fis during both growth phases (B).
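
The orders of magnitude quoted for Fis can be checked with a few lines of Python. This is only a
sketch of the arithmetic, assuming that the tenfold increase over 80 minutes is spread over 1600
multiplicative steps with a constant per-step factor $\gamma$; the variable names are illustrative.

```python
import math

FOLD_CHANGE = 10.0  # Fis concentration multiplied by 10 (from the text)
MINUTES = 80        # duration of the observation (from the text)
STEPS = 1600        # model steps covering those 80 minutes (from the text)

# Per-step growth factor gamma such that gamma ** STEPS equals FOLD_CHANGE.
gamma = math.exp(math.log(FOLD_CHANGE) / STEPS)
# Real time represented by one model step.
seconds_per_step = MINUTES * 60 / STEPS

print(round(gamma, 4))            # 1.0014
print(seconds_per_step)           # 3.0
print(2 * 60 / seconds_per_step)  # 40.0 steps during the first 2 minutes
```

Under this reading, the measured value to be compared with $\gamma_1$ of Theorem 1 is about 1.0014,
which agrees with the rounded value 1.001 given in the text.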

Figure 4 (B) shows the estimated variation of the Cya and Fis proteins as a function of growth
phases. During stationary growth, our model accurately predicts a decrease followed by an increase
of Fis protein production (Ball et al., 1992). This emphasizes the ability of our approach to spread
partial quantitative knowledge through the qualitative network. Interestingly, although the
quantitative estimation uses only two measures taken during the stationary phase, our model
efficiently predicts the decrease of the Fis concentration during the exponential phase. This model
artifact represents an emergent quantitative property of the biological system, which gives insights
into global behaviors.

The estimated Cya protein variations are also consistent with experiments during the stationary
phase. However, despite an appropriate increase at the beginning of the exponential phase, Cya
production does not follow the expected peak (Notley-McRobb et al., 1997). This might reflect a
shortcoming of the model or a missing qualitative transition that represses the cya gene. We
consider such information as guidance for future models or further experiments that might focus on
cya gene regulation.

Close attention to the estimated probabilities gives results that are related to the quantitative
sensitivity of the model. More precisely, an estimation of the hyper-volume associated with the
model indicates whether a new measure is informative or not. Our model shows that the probability
associated with the transition between topA and fis is highly constrained in order to maintain
overall consistency between heterogeneous pieces of information. This transition is a shortcut
adapted from (Ropers et al., 2006) for representing the effect of DNA supercoiling on fis gene
expression. Experiments suggest that fis is involved in the fine tuning of the homeostatic control
of DNA supercoiling (Schneider et al., 2000). A small change in DNA supercoiling drastically affects
fis expression. This information is consistent with our estimated impact of this transition on the
global system behavior.



4 Discussion 

Fruitful probabilistic approaches have recently been developed for studying gene regulatory networks
(Shmulevich et al., 2002, Zhou et al., 2004, Kim et al., 2002). These approaches add probabilities
to an already defined deterministic model. This gives the opportunity to study the impact of
probability variations and eventually to determine probability sets that accurately represent
experiments. Knowing the transition probability graph, the major issue of these approaches is to
compute the asymptotic (stationary) distribution and to reason on it.

Our original method appears as a complementary approach that adds new natural information to
a general probabilistic graph. It gives the opportunity to reason on emerging system properties
by focusing on asymptotic properties of the probabilistic model. We prove that these asymptotics
are related to natural constants of a weighted transition matrix. The proposed method allows us to
design constraints between probabilities and observations, which gives the opportunity to deal with
unknown transition probabilities. Our results therefore apply to a large class of probabilistic
models, and their integration within a more general framework such as PBN and Bayesian networks
seems promising.

The number of biological details available defines the model abstraction level, which leads to
choosing an appropriate biological abstraction. It is more or less discrete depending on the number
of qualitative states. Our probabilistic-like technique is able to combine quantitative information
with various qualitative abstractions of biological systems, i.e., from boolean to PDE networks
(de Jong, 2002). Our method therefore offers a convenient flexibility for analyzing biological
systems, because it presents major advantages for integrating heterogeneous knowledge such as
that which constitutes the Escherichia coli starvation system.

During this study, various biological models were elaborated. After probability optimization, most
of them give relevant quantitative simulation results. Nevertheless, they remain unable to
reproduce the whole set of expected experimental behaviors. This confirms the value of reasoning
rather than just simulating, since a model cannot be validated using only a few simulations.
Furthermore, it emphasizes the need for an appropriate qualitative validation of model behaviors
prior to applying our probabilistic technique. For this purpose, the biological system has been
described using a set of original qualitative rules. This allows us to use a formal verification
technique to meet the qualitative validation requirement. Our technique therefore appears as a
natural extension of regular qualitative modeling approaches, extending robust qualitative models
toward quantitative properties.